Towards Automatic Topical Classification of LOD Datasets
Abstract
The datasets that are part of the Linking Open Data cloud diagram (LOD cloud) are classified into the following topical categories: media, government, publications, life sciences, geographic, social networking, user-generated content, and cross-domain. These categories were assigned to the datasets manually. In this paper, we investigate to what extent the topical classification of new LOD datasets can be automated using machine learning techniques, with the existing annotations as supervision. We conducted experiments with different classification techniques and different feature sets. The best combination of classification technique and feature set reaches an accuracy of 81.62% on the task of assigning one of the eight classes to a given LOD dataset. A deeper inspection of the classification errors reveals problems with the manual classification of datasets in the current LOD cloud.
Related resources
Towards Dataset Dynamics: Change Frequency of Linked Open Data Sources
Datasets in the LOD cloud are far from being static in their nature and how they are exposed. As resources are added and new links are set, applications consuming the data should be able to deal with these changes. In this paper we investigate how LOD datasets change and what sensible measures there are to accommodate dataset dynamics. We compare our findings with traditional, document-centric ...
Object-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images
As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...
Roomba: Automatic Validation, Correction and Generation of Dataset Metadata
Data is being published by both the public and private sectors and covers a diverse set of domains ranging from life sciences to media or government data. An example is the Linked Open Data (LOD) cloud which is potentially a gold mine for organizations and individuals who are trying to leverage external data sources in order to produce more informed business decisions. Considering the significa...
Breast Cancer Diagnosis from Perspective of Class Imbalance
Introduction: Breast cancer is the second leading cause of mortality among women. Early detection is the only way to reduce the risk of breast cancer mortality. Traditional methods cannot effectively diagnose tumors since they assume a well-balanced dataset. However, a hybrid method can help to alleviate the two-class imbalance problem existing in the ...
Adoption of the Linked Data Best Practices in Different Topical Domains
The central idea of Linked Data is that data publishers support applications in discovering and integrating data by complying with a set of best practices in the areas of linking, vocabulary usage, and metadata provision. In 2011, the State of the LOD Cloud report analyzed the adoption of these best practices by linked datasets within different topical domains. The report was based on information...
Publication date: 2015